Fix cuda_bindings conjugate_gradient_multi_block_cg.py example #1922

Draft
rwgk wants to merge 2 commits into NVIDIA:main from rwgk:fix_conjugate_gradient_multi_block_cg
rwgk commented Apr 16, 2026

This PR started out as a humble attempt to eliminate the only remaining pytest.skip() in our examples (similar to #1861), but it turned into a bug-fix pass once the example actually began running.

  • Enable cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py by removing its unconditional NVRTC waiver (which dates all the way back to the initial cuda-python commit 4a36f83) and replacing standalone pytest.skip() calls with requirement_not_met()
  • Simplify the platform gating in that example
  • Fix the Python-side/runtime issues that had been hidden behind the waiver:
    • invalid C-style %d formatting inside an f-string
    • gen_tridiag() variable shadowing that broke CSR construction
    • cooperative kernel launch not passing the computed dynamic shared memory size
    • managed-memory pointer variables being overwritten by loop indices before kernel launch and cleanup
    • residual access after freeing dot_result, which caused a teardown segfault
  • Drive-by: fix the separate QNX gate in cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py by checking platform.system() == "QNX" instead of platform.machine() == "qnx". QNX is an operating system rather than a machine architecture, so the platform.machine() check does not detect QNX hosts.
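For the gen_tridiag() item above, the following is a minimal pure-Python sketch of building a tridiagonal matrix in CSR form. It is not the code from the example. The function name `gen_tridiag_csr`, its parameters, and the value ranges are assumptions made for illustration. The point is that the output arrays (`row_offsets`, `col_indices`, `values`) and the loop variable (`row`) have distinct names, so a loop index cannot shadow an array and leave the row offsets unpopulated.

```python
import random


def gen_tridiag_csr(n, seed=0):
    """Build a random symmetric tridiagonal n x n matrix in CSR form.

    Hypothetical helper for illustration; not the example's gen_tridiag().
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    # One off-diagonal value per neighbouring pair, reused for (r, r+1) and (r+1, r).
    off_diag = [rng.random() for _ in range(n - 1)]

    row_offsets = [0] * (n + 1)
    col_indices = []
    values = []
    for row in range(n):
        # Offset of the first stored entry of this row.
        row_offsets[row] = len(col_indices)
        if row > 0:
            col_indices.append(row - 1)
            values.append(off_diag[row - 1])
        # Large diagonal entry keeps the matrix diagonally dominant.
        col_indices.append(row)
        values.append(rng.random() + 10.0)
        if row < n - 1:
            col_indices.append(row + 1)
            values.append(off_diag[row])
    # The final offset equals the total number of stored entries.
    row_offsets[n] = len(col_indices)
    return row_offsets, col_indices, values


row_offsets, col_indices, values = gen_tridiag_csr(4)
```

A tridiagonal n x n matrix stores 3n - 2 entries, so for n = 4 the last row offset is 10.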

rwgk added 2 commits April 15, 2026 16:54
Remove the unconditional NVRTC waiver from the renamed example so CI can exercise its real execution path again. While re-enabling it, replace the standalone pytest.skip() checks with requirement_not_met() and simplify the platform gating.

The waived code path had been hiding several Python-side runtime bugs that required these fixes:
- replaced the invalid C-style %d f-string formatting
- fixed gen_tridiag() variable shadowing so the CSR row-offset array is actually populated
- passed the computed dynamic shared-memory size into cuLaunchCooperativeKernel() and made that size integer-valued
- stopped overwriting managed-memory pointer variables with loop indices before kernel launch and cleanup
- cached the residual before freeing dot_result, which removed the teardown segfault
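Two of the fixes above are plain Python pitfalls and can be reproduced without a GPU. The sketch below is illustrative only: the variable names, the launch configuration, and the "one double per warp, plus one" sizing rule are assumptions, not lines taken from the example.

```python
import ctypes

threads_per_block = 256  # hypothetical launch configuration
iterations = 42          # hypothetical value to print

# A printf-style "%d" is not substituted inside an f-string; it is emitted literally.
literal = f"iterations = %d"

# Used as a format spec, "%d" is rejected at runtime.
try:
    f"{iterations:%d}"
    spec_rejected = False
except ValueError:
    spec_rejected = True

# A replacement field with the "d" presentation type does what "%d" was meant to do.
fixed = f"iterations = {iterations:d}"

# True division yields a float, so the computed byte count is a float as well.
float_bytes = ctypes.sizeof(ctypes.c_double) * ((threads_per_block / 32) + 1)

# Floor division keeps the byte count an int (assumed rule: one double per warp, plus one).
shared_mem_bytes = ctypes.sizeof(ctypes.c_double) * ((threads_per_block // 32) + 1)
```

Both size expressions evaluate to the same number, but only the floor-division form yields an `int`, which is what an integer byte-count parameter needs.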

Made-with: Cursor
QNX is an operating system rather than a machine architecture, so checking platform.machine() can miss the requirement_not_met() path on QNX hosts. Use platform.system() so the example is waived consistently on that platform.
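The difference between the two checks can be sketched as follows. The helper `is_qnx` and its optional `system_name` parameter are hypothetical and exist only so the logic can be exercised on any host. The example itself calls requirement_not_met(), which is not reproduced here.

```python
import platform


def is_qnx(system_name=None):
    """Return True when the operating system name is "QNX".

    Hypothetical helper: system_name lets callers inject a value for testing;
    by default the name comes from platform.system().
    """
    name = platform.system() if system_name is None else system_name
    return name == "QNX"


# platform.machine() reports the hardware architecture (for example "x86_64"
# or "aarch64"), so comparing it with an operating system name never matches.
machine = platform.machine()
on_qnx = is_qnx()
```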

Made-with: Cursor
rwgk added this to the cuda.bindings next milestone Apr 16, 2026
rwgk self-assigned this Apr 16, 2026
rwgk added the bug, P1, and cuda.bindings labels Apr 16, 2026
copy-pr-bot bot commented Apr 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk commented Apr 16, 2026

/ok to test
